Global Deaths Due to Air Pollution

Elizabeth Bekele, Alison Cheek

2022-05-03

Introduction

Packages Required

# tidyverse (which includes dplyr) lets us filter and summarize the data
library(tidyverse)
library(dplyr)
# ggplot2 plots the figures that showcase our findings
library(ggplot2)
# knitr and kableExtra organize and display tables
library(knitr)
library(kableExtra)
# plotly adds interactive plots
library(plotly)
# Disable scientific notation in printed numbers
options(scipen = 999)

Deaths Data

Import the deaths-due-to-air-pollution data

deaths_df <- read.csv("death-rates-from-air-pollution.csv")

We are going to rename the columns to shorter names and glimpse the data

colnames(deaths_df) <- c("country", "acronym", "year", "total_deaths", "indoor_deaths", "outdoor_deaths", "ozone_deaths")

glimpse(deaths_df)
## Rows: 6,468
## Columns: 7
## $ country        <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist~
## $ acronym        <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",~
## $ year           <int> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1~
## $ total_deaths   <dbl> 299.4773, 291.2780, 278.9631, 278.7908, 287.1629, 288.0~
## $ indoor_deaths  <dbl> 250.3629, 242.5751, 232.0439, 231.6481, 238.8372, 239.9~
## $ outdoor_deaths <dbl> 46.44659, 46.03384, 44.24377, 44.44015, 45.59433, 45.36~
## $ ozone_deaths   <dbl> 5.616442, 5.603960, 5.611822, 5.655266, 5.718922, 5.739~

Data Variables

Variables that interest us here include:

World Population Data

Now, let’s take a look at the population data.

world_pop <- read.csv("population_total_long.csv")
glimpse(world_pop)
## Rows: 12,595
## Columns: 3
## $ Country.Name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra", "~
## $ Year         <int> 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 196~
## $ Count        <int> 54211, 8996973, 5454933, 1608800, 13411, 92418, 20481779,~

To get a general idea of the deaths data frame we made, let’s make a plot to see what’s happening. This is a plot of indoor versus outdoor deaths around the world, by country.

Plotting every country at once is too cluttered to read, so we chose two countries from each continent (one high-population and one low-population country) to graph.

We selected a high-population country from each continent and used the formula below to determine the target population for its low-population counterpart.

Low population = high population * 0.10
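As a rough illustration of the formula, the sketch below applies it to the 1997 populations listed in this report. The pairing of each high-population country with a low-population country on the same continent is our reading of the selection, and the column names (`target_low_pop`, `actual_ratio`) are made up for this example.

```r
library(dplyr)

# 1997 populations copied from the tables in this report.
# The high/low pairing by continent is an assumption made for illustration.
pairs_1997 <- data.frame(
  high_country = c("United States", "Brazil", "Germany", "Nigeria", "Pakistan", "Australia"),
  high_pop     = c(272657000, 167209040, 82034771, 113457663, 131057431, 18517000),
  low_country  = c("Canada", "Chile", "Serbia", "Malawi", "Sri Lanka", "New Zealand"),
  low_pop      = c(29905948, 14786220, 7596501, 10264906, 18470900, 3781300)
)

pairs_1997 <- pairs_1997 %>%
  mutate(
    target_low_pop = high_pop * 0.10,     # the formula: 10% of the high population
    actual_ratio   = low_pop / high_pop   # how close the chosen country comes to 0.10
  )

print(pairs_1997)
```

The `actual_ratio` column shows that the chosen countries only approximate the 10% target, which is expected since no country will match it exactly.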

High-population countries, 1997-2017:
Country.Name Year Count
Australia 1997 18517000
Brazil 1997 167209040
Germany 1997 82034771
Nigeria 1997 113457663
Pakistan 1997 131057431
United States 1997 272657000
Australia 1998 18711000
Brazil 1998 169785250
Germany 1998 82047195
Nigeria 1998 116319759
Pakistan 1998 134843233
United States 1998 275854000
Australia 1999 18926000
Brazil 1999 172318675
Germany 1999 82100243
Nigeria 1999 119260063
Pakistan 1999 138624621
United States 1999 279040000
Australia 2000 19153000
Brazil 2000 174790340
Germany 2000 82211508
Nigeria 2000 122283850
Pakistan 2000 142343578
United States 2000 282162411
Australia 2001 19413000
Brazil 2001 177196054
Germany 2001 82349925
Nigeria 2001 125394046
Pakistan 2001 145978402
United States 2001 284968955
Australia 2002 19651400
Brazil 2002 179537520
Germany 2002 82488495
Nigeria 2002 128596076
Pakistan 2002 149549700
United States 2002 287625193
Australia 2003 19895400
Brazil 2003 181809246
Germany 2003 82534176
Nigeria 2003 131900631
Pakistan 2003 153093373
United States 2003 290107933
Australia 2004 20127400
Brazil 2004 184006481
Germany 2004 82516260
Nigeria 2004 135320422
Pakistan 2004 156664697
United States 2004 292805298
Australia 2005 20394800
Brazil 2005 186127103
Germany 2005 82469422
Nigeria 2005 138865016
Pakistan 2005 160304008
United States 2005 295516599
Australia 2006 20697900
Brazil 2006 188167356
Germany 2006 82376451
Nigeria 2006 142538308
Pakistan 2006 164022627
United States 2006 298379912
Australia 2007 20827600
Brazil 2007 190130443
Germany 2007 82266372
Nigeria 2007 146339977
Pakistan 2007 167808105
United States 2007 301231207
Australia 2008 21249200
Brazil 2008 192030362
Germany 2008 82110097
Nigeria 2008 150269623
Pakistan 2008 171648986
United States 2008 304093966
Australia 2009 21691700
Brazil 2009 193886508
Germany 2009 81902307
Nigeria 2009 154324933
Pakistan 2009 175525609
United States 2009 306771529
Australia 2010 22031750
Brazil 2010 195713635
Germany 2010 81776930
Nigeria 2010 158503197
Pakistan 2010 179424641
United States 2010 309326085
Australia 2011 22340024
Brazil 2011 197514534
Germany 2011 80274983
Nigeria 2011 162805071
Pakistan 2011 183340592
United States 2011 311580009
Australia 2012 22733465
Brazil 2012 199287296
Germany 2012 80425823
Nigeria 2012 167228767
Pakistan 2012 187281475
United States 2012 313874218
Australia 2013 23128129
Brazil 2013 201035903
Germany 2013 80645605
Nigeria 2013 171765769
Pakistan 2013 191262919
United States 2013 316057727
Australia 2014 23475686
Brazil 2014 202763735
Germany 2014 80982500
Nigeria 2014 176404902
Pakistan 2014 195306825
United States 2014 318386421
Australia 2015 23815995
Brazil 2015 204471769
Germany 2015 81686611
Nigeria 2015 181137448
Pakistan 2015 199426964
United States 2015 320742673
Australia 2016 24190907
Brazil 2016 206163058
Germany 2016 82348669
Nigeria 2016 185960289
Pakistan 2016 203627284
United States 2016 323071342
Australia 2017 24601860
Brazil 2017 207833831
Germany 2017 82657002
Nigeria 2017 190873311
Pakistan 2017 207896686
United States 2017 325147121

Low-population countries, 1997-2017:
Country.Name Year Count
Canada 1997 29905948
Chile 1997 14786220
Sri Lanka 1997 18470900
Malawi 1997 10264906
New Zealand 1997 3781300
Serbia 1997 7596501
Canada 1998 30155173
Chile 1998 14977733
Sri Lanka 1998 18564599
Malawi 1998 10552338
New Zealand 1998 3815000
Serbia 1998 7567745
Canada 1999 30401286
Chile 1999 15162800
Sri Lanka 1999 18663284
Malawi 1999 10854322
New Zealand 1999 3835100
Serbia 1999 7540401
Canada 2000 30685730
Chile 2000 15342353
Sri Lanka 2000 18777601
Malawi 2000 11148758
New Zealand 2000 3857700
Serbia 2000 7516346
Canada 2001 31020902
Chile 2001 15516113
Sri Lanka 2001 18911730
Malawi 2001 11432000
New Zealand 2001 3880500
Serbia 2001 7503433
Canada 2002 31360079
Chile 2002 15684409
Sri Lanka 2002 19062482
Malawi 2002 11713664
New Zealand 2002 3948500
Serbia 2002 7496522
Canada 2003 31644028
Chile 2003 15849652
Sri Lanka 2003 19224037
Malawi 2003 12000181
New Zealand 2003 4027200
Serbia 2003 7480591
Canada 2004 31940655
Chile 2004 16014971
Sri Lanka 2004 19387153
Malawi 2004 12301838
New Zealand 2004 4087500
Serbia 2004 7463157
Canada 2005 32243753
Chile 2005 16182721
Sri Lanka 2005 19544988
Malawi 2005 12625952
New Zealand 2005 4133900
Serbia 2005 7440769
Canada 2006 32571174
Chile 2006 16354504
Sri Lanka 2006 19695972
Malawi 2006 12973699
New Zealand 2006 4184600
Serbia 2006 7411569
Canada 2007 32889025
Chile 2007 16530195
Sri Lanka 2007 19842044
Malawi 2007 13341806
New Zealand 2007 4223800
Serbia 2007 7381579
Canada 2008 33247118
Chile 2008 16708258
Sri Lanka 2008 19983984
Malawi 2008 13727890
New Zealand 2008 4259800
Serbia 2008 7350222
Canada 2009 33628895
Chile 2009 16886186
Sri Lanka 2009 20123508
Malawi 2009 14128155
New Zealand 2009 4302600
Serbia 2009 7320807
Canada 2010 34004889
Chile 2010 17062536
Sri Lanka 2010 20261737
Malawi 2010 14539612
New Zealand 2010 4350700
Serbia 2010 7291436
Canada 2011 34339328
Chile 2011 17233576
Sri Lanka 2011 20398670
Malawi 2011 14962112
New Zealand 2011 4384000
Serbia 2011 7234099
Canada 2012 34714222
Chile 2012 17400347
Sri Lanka 2012 20425000
Malawi 2012 15396005
New Zealand 2012 4408100
Serbia 2012 7199077
Canada 2013 35082954
Chile 2013 17571507
Sri Lanka 2013 20585000
Malawi 2013 15839269
New Zealand 2013 4442100
Serbia 2013 7164132
Canada 2014 35437435
Chile 2014 17758959
Sri Lanka 2014 20778000
Malawi 2014 16289540
New Zealand 2014 4509700
Serbia 2014 7130576
Canada 2015 35702908
Chile 2015 17969353
Sri Lanka 2015 20970000
Malawi 2015 16745303
New Zealand 2015 4595700
Serbia 2015 7095383
Canada 2016 36109487
Chile 2016 18209068
Sri Lanka 2016 21203000
Malawi 2016 17205289
New Zealand 2016 4693200
Serbia 2016 7058322
Canada 2017 36540268
Chile 2017 18470439
Sri Lanka 2017 21444000
Malawi 2017 17670260
New Zealand 2017 4793900
Serbia 2017 7020858
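
The code that produced the two population tables is not shown here. A minimal sketch of one way to build them is given below, using a small stand-in for `world_pop` with values copied from this report. The names `world_pop_toy`, `high_pop_names`, and `low_pop_names`, the 1997 cutoff, and the sort order are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for world_pop; values are copied from output shown in this report
world_pop_toy <- data.frame(
  Country.Name = c("Aruba", "Australia", "United States", "Canada", "Australia", "Canada"),
  Year         = c(1960L, 1997L, 1997L, 1997L, 1998L, 1998L),
  Count        = c(54211L, 18517000L, 272657000L, 29905948L, 18711000L, 30155173L)
)

high_pop_names <- c("United States", "Brazil", "Nigeria", "Germany", "Pakistan", "Australia")
low_pop_names  <- c("Canada", "Chile", "Malawi", "Serbia", "Sri Lanka", "New Zealand")

# Keep only the chosen countries from 1997 onward, sorted by year then country
high_pop <- world_pop_toy %>%
  filter(Country.Name %in% high_pop_names, Year >= 1997) %>%
  arrange(Year, Country.Name)

low_pop <- world_pop_toy %>%
  filter(Country.Name %in% low_pop_names, Year >= 1997) %>%
  arrange(Year, Country.Name)

print(high_pop)
print(low_pop)
```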

Combine Data Sets

First, let’s look at a table of the high- and low-population countries using the world population data set.

## # A tibble: 6 x 3
## # Groups:   Year [1]
##   Country.Name   Year     Count
##   <chr>         <int>     <int>
## 1 Australia      1997  18517000
## 2 Brazil         1997 167209040
## 3 Germany        1997  82034771
## 4 Nigeria        1997 113457663
## 5 Pakistan       1997 131057431
## 6 United States  1997 272657000
## # A tibble: 6 x 3
## # Groups:   Year [1]
##   Country.Name  Year    Count
##   <chr>        <int>    <int>
## 1 Canada        1997 29905948
## 2 Chile         1997 14786220
## 3 Sri Lanka     1997 18470900
## 4 Malawi        1997 10264906
## 5 New Zealand   1997  3781300
## 6 Serbia        1997  7596501

Next, we are going to see the death count for the high- and low-population countries using the deaths data frame.

## # A tibble: 6 x 7
## # Groups:   year [6]
##   country   acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>     <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Australia AUS      1997         22.4         0.322           21.8        0.314
## 2 Australia AUS      1998         21.5         0.284           21.0        0.305
## 3 Australia AUS      1999         20.4         0.259           19.9        0.295
## 4 Australia AUS      2000         19.4         0.240           18.9        0.290
## 5 Australia AUS      2001         18.6         0.223           18.1        0.284
## 6 Australia AUS      2002         18.1         0.211           17.7        0.286
## # A tibble: 6 x 7
## # Groups:   year [6]
##   country acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>   <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Canada  CAN      1997         21.9        0.0878           19.9         2.20
## 2 Canada  CAN      1998         21.7        0.0824           19.6         2.21
## 3 Canada  CAN      1999         21.2        0.0751           19.2         2.19
## 4 Canada  CAN      2000         20.3        0.0682           18.3         2.13
## 5 Canada  CAN      2001         19.8        0.0641           17.9         2.08
## 6 Canada  CAN      2002         19.5        0.0605           17.7         2.05

Lastly, we will join the population data to the deaths data by country.

## # A tibble: 6 x 8
## # Groups:   year [6]
##   country   acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>     <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Australia AUS      1997         22.4         0.322           21.8        0.314
## 2 Australia AUS      1998         21.5         0.284           21.0        0.305
## 3 Australia AUS      1999         20.4         0.259           19.9        0.295
## 4 Australia AUS      2000         19.4         0.240           18.9        0.290
## 5 Australia AUS      2001         18.6         0.223           18.1        0.284
## 6 Australia AUS      2002         18.1         0.211           17.7        0.286
## # ... with 1 more variable: Count <int>
## # A tibble: 6 x 8
## # Groups:   year [6]
##   country acronym  year total_deaths indoor_deaths outdoor_deaths ozone_deaths
##   <chr>   <chr>   <int>        <dbl>         <dbl>          <dbl>        <dbl>
## 1 Canada  CAN      1997         21.9        0.0878           19.9         2.20
## 2 Canada  CAN      1998         21.7        0.0824           19.6         2.21
## 3 Canada  CAN      1999         21.2        0.0751           19.2         2.19
## 4 Canada  CAN      2000         20.3        0.0682           18.3         2.13
## 5 Canada  CAN      2001         19.8        0.0641           17.9         2.08
## 6 Canada  CAN      2002         19.5        0.0605           17.7         2.05
## # ... with 1 more variable: Count <int>
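
The join itself is not shown, so here is a hedged sketch of how it might look with `inner_join()`. It uses toy data frames whose values are taken from the output above. Matching on year as well as country is our assumption: without it, every death row would pair with every population row for that country.

```r
library(dplyr)

# Toy subsets; values copied (rounded as printed) from the output in this report
deaths_toy <- data.frame(
  country      = c("Australia", "Australia", "Canada"),
  acronym      = c("AUS", "AUS", "CAN"),
  year         = c(1997L, 1998L, 1997L),
  total_deaths = c(22.4, 21.5, 21.9)
)

pop_toy <- data.frame(
  Country.Name = c("Australia", "Australia", "Canada"),
  Year         = c(1997L, 1998L, 1997L),
  Count        = c(18517000L, 18711000L, 29905948L)
)

# Match rows on both country and year; the key columns are named differently
# in the two data frames, so the mapping is spelled out in `by`
deaths_with_pop <- deaths_toy %>%
  inner_join(pop_toy, by = c("country" = "Country.Name", "year" = "Year"))

print(deaths_with_pop)
```

The result keeps the deaths columns and gains `Count`, which matches the "1 more variable: Count" note in the printed tibbles.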

Death Count

Which country has the highest death count?

Let’s make a table showing the high- and low-population countries and their respective death counts due to pollution.

country average_death_high
Australia 17.76815
Brazil 48.42928
Germany 28.10988
Nigeria 112.30157
Pakistan 144.33463
United States 26.35827
country average_death_low
Canada 18.18542
Chile 36.51321
Malawi 147.77167
New Zealand 15.92536
Serbia 80.66558
Sri Lanka 69.60383

Here’s a graph to visualize the previous table.

So we’ve looked at the deaths due to pollution, but what percentage of the population was affected?

Country.Name average_population
Australia 21217772
Brazil 189132292
Germany 81914540
Nigeria 148549958
Pakistan 168525322
United States 300447600
Country.Name average_population
Canada 33029774
Chile 16555805
Malawi 13605376
New Zealand 4214995
Serbia 7345882
Sri Lanka 19824652
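
One way to answer the percentage question is sketched below. It assumes the death values are rates per 100,000 people, as the axis labels in this report's plotting code suggest. The column names `percent_of_population` and `approx_deaths_per_year` are hypothetical, and the averages for deaths and population may cover different year ranges, so the figures are only rough.

```r
library(dplyr)

# Averages copied from the tables in this report
avg_toy <- data.frame(
  country            = c("Australia", "Pakistan", "Malawi"),
  average_death      = c(17.76815, 144.33463, 147.77167),
  average_population = c(21217772, 168525322, 13605376)
)

avg_toy <- avg_toy %>%
  mutate(
    # Assumption: average_death is a rate per 100,000 people
    percent_of_population  = average_death / 100000 * 100,
    # Rough yearly number of deaths implied by that rate
    approx_deaths_per_year = average_death / 100000 * average_population
  )

print(avg_toy)
```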

Pollution Types

Which type of pollution has the greatest number of deaths?

## # A tibble: 6 x 4
##   country       avg_indoor avg_outdoor avg_ozone
##   <chr>              <dbl>       <dbl>     <dbl>
## 1 Australia          0.249        17.2     0.360
## 2 Brazil            19.4          26.8     2.74 
## 3 Germany            0.717        25.5     2.34 
## 4 Nigeria           75.9          35.2     2.12 
## 5 Pakistan          87.7          50.5    10.4  
## 6 United States      0.166        22.8     3.92
## # A tibble: 6 x 4
##   country     avg_indoor avg_outdoor avg_ozone
##   <chr>            <dbl>       <dbl>     <dbl>
## 1 Canada          0.0651        16.4    1.97  
## 2 Chile           8.69          27.2    0.850 
## 3 Malawi        132.            13.8    3.39  
## 4 New Zealand     0.291         15.6    0.0728
## 5 Serbia         35.9           42.7    2.94  
## 6 Sri Lanka      44.5           24.8    0.430
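
The per-type averages above could be computed along these lines. This is a sketch on a toy subset (values taken from the output earlier in this report), and the `largest_type` column is an addition for illustration that flags which pollution type has the highest average for each country.

```r
library(dplyr)

# Toy subset; values copied from output shown earlier in this report
deaths_toy <- data.frame(
  country        = c("Afghanistan", "Afghanistan", "Australia", "Australia"),
  year           = c(1990L, 1991L, 1997L, 1998L),
  indoor_deaths  = c(250.3629, 242.5751, 0.322, 0.284),
  outdoor_deaths = c(46.44659, 46.03384, 21.8, 21.0),
  ozone_deaths   = c(5.616442, 5.603960, 0.314, 0.305)
)

# Average each pollution type per country
avg_by_type <- deaths_toy %>%
  group_by(country) %>%
  summarize(
    avg_indoor  = mean(indoor_deaths),
    avg_outdoor = mean(outdoor_deaths),
    avg_ozone   = mean(ozone_deaths)
  )

# For each row, pick the name of the column holding the largest average
type_cols <- c("avg_indoor", "avg_outdoor", "avg_ozone")
avg_by_type$largest_type <- type_cols[
  max.col(as.matrix(avg_by_type[, type_cols]), ties.method = "first")
]

print(avg_by_type)
```

In the tables above the same pattern appears: indoor pollution dominates in Nigeria, Pakistan, Malawi, and Sri Lanka, while outdoor pollution dominates in the other countries.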

Pollution Over Time

Let’s look at the previous two decades and compare the death counts. Has there been a change?

This is the first decade, 1996-2006:
country High_Deaths_96 High_Deaths_01 High_Deaths_06
Australia 23.04465 18.58572 14.92239
Brazil 60.67757 49.46436 41.46829
Germany 34.72325 28.38756 23.83654
Nigeria 136.08978 123.05129 102.26653
Pakistan 155.42988 151.25352 146.09296
United States 29.99271 28.93114 25.93369
country Low_Deaths_96 Low_Deaths_01 Low_Deaths_06
Australia 22.18101 19.82451 14.92239
Brazil 46.36829 37.43188 41.46829
Germany 183.14179 165.41702 23.83654
Nigeria 93.44700 83.18333 102.26653
Pakistan 85.28997 72.16239 146.09296
United States 100.66078 95.27073 25.93369

This is the second decade, 2007-2017:
country High_Deaths_07 High_Deaths_12 High_Deaths_17
Australia 14.92140 12.65973 10.79595
Brazil 40.42460 35.39069 30.32108
Germany 23.45850 20.91536 19.82826
Nigeria 98.90306 84.22324 81.22147
Pakistan 143.81724 133.93887 123.21548
United States 25.11756 21.98194 18.82515
country Low_Deaths_07 Low_Deaths_12 Low_Deaths_17
Canada 16.93196 13.82968 10.71662
Chile 30.53130 27.31475 24.29921
Malawi 132.12253 116.27470 104.93508
Serbia 76.65752 72.77354 62.57853
Sri Lanka 66.05987 59.22433 38.46264
Tonga 87.81178 79.49336 70.72940
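
A table in this wide shape (one column per snapshot year) can be built with `pivot_wider()`. The sketch below is one possible approach, not necessarily the code used for this report. It uses a toy subset with values from the first-decade table, and `change_96_to_06` is a hypothetical extra column that makes the trend explicit.

```r
library(dplyr)
library(tidyr)

# Toy subset; values copied from the first-decade table in this report
deaths_toy <- data.frame(
  country      = rep(c("Australia", "Pakistan"), each = 3),
  year         = rep(c(1996L, 2001L, 2006L), times = 2),
  total_deaths = c(23.04465, 18.58572, 14.92239, 155.42988, 151.25352, 146.09296)
)

first_decade <- deaths_toy %>%
  filter(year %in% c(1996L, 2001L, 2006L)) %>%
  # Two-digit year label, e.g. 1996 -> "96", 2001 -> "01"
  mutate(year = sprintf("%02d", year %% 100L)) %>%
  pivot_wider(
    names_from   = year,
    values_from  = total_deaths,
    names_prefix = "High_Deaths_"
  ) %>%
  # Negative values mean the rate fell over the decade
  mutate(change_96_to_06 = High_Deaths_06 - High_Deaths_96)

print(first_decade)
```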

Let’s graph the previous tables!

The first decade.

This shows the second decade.

Which year had the worst indoor pollution deaths? Outdoor particulate deaths? Outdoor ozone deaths?
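
To pick out the worst year for each pollutant, one option is to reshape the three death columns into long form and take the maximum per pollutant with `slice_max()`. The sketch below does this for a single country, using the first four Afghanistan rows from the glimpse output. For the full data set, the rates would first need to be aggregated across countries, and how to do that is left open here.

```r
library(dplyr)
library(tidyr)

# First four Afghanistan rows, copied from the glimpse() output in this report
afg <- data.frame(
  year           = 1990:1993,
  indoor_deaths  = c(250.3629, 242.5751, 232.0439, 231.6481),
  outdoor_deaths = c(46.44659, 46.03384, 44.24377, 44.44015),
  ozone_deaths   = c(5.616442, 5.603960, 5.611822, 5.655266)
)

worst_years <- afg %>%
  # One row per year and pollutant
  pivot_longer(
    cols      = ends_with("_deaths"),
    names_to  = "pollutant",
    values_to = "rate"
  ) %>%
  group_by(pollutant) %>%
  # Keep the row with the highest rate for each pollutant
  slice_max(rate, n = 1) %>%
  ungroup()

print(worst_years)
```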

Indoor Deaths

Outdoor Deaths

Ozone Deaths

Which is worse?

Outdoor or indoor pollution?

Let’s reintroduce a graph we looked at earlier. This time, we will combine the pollutant types in a single plot.

We cannot conclude which is worse.

We have this included already

# Mean total deaths for high-population countries (no year filter is applied, so all years in the data are averaged)
deaths_highpop_countries <- deaths_df %>% 
  filter(country %in% c('United States', 'Brazil', 'Nigeria', 'Germany', 'Pakistan', 'Australia')) %>% 
  group_by(country) %>% 
  select(total_deaths) %>% 
  summarize(average_death_high = mean(total_deaths))
## Adding missing grouping variables: `country`
# Mean total deaths from 1996-2017 for low-population countries
deaths_lowpop_countries <- deaths_df %>% 
  filter(year > 1995 & country %in% c('Canada', 'Chile', 'Malawi', 'Serbia', 'Sri Lanka', 'New Zealand')) %>% 
  group_by(country) %>% 
  select(total_deaths) %>% 
  summarize(average_death_low = mean(total_deaths))
## Adding missing grouping variables: `country`
#death_lowpop_countries
kable(list(deaths_highpop_countries, deaths_lowpop_countries))
country average_death_high
Australia 17.76815
Brazil 48.42928
Germany 28.10988
Nigeria 112.30157
Pakistan 144.33463
United States 26.35827
country average_death_low
Canada 16.86963
Chile 32.58415
Malawi 140.50830
New Zealand 14.08771
Serbia 78.12194
Sri Lanka 65.51438
ggplot(deaths_highpop_countries) +
  geom_col(mapping = aes(x = country, y = average_death_high)) +
  xlab("Country") +
  ylab("Average deaths (per 100,000)") +
  ggtitle("Average total deaths in high-population countries") +
  coord_flip()

ggplot(deaths_lowpop_countries) +
  geom_col(mapping = aes(x = country, y = average_death_low)) +
  xlab("Country") +
  ylab("Average deaths (per 100,000)") +
  ggtitle("Average total deaths in low-population countries") +
  coord_flip()

Summary

Sources